Abstract

A new family of distances for graph vertices is proposed. These distances reduce to the shortest path distance and to the resistance distance at the extreme values of the family parameter. Their most important property is that they are graph-geodetic: $d(i,j) + d(j,k) = d(i,k)$ if and only if every path from $i$ to $k$ passes through $j$. The construction of the distances is based on the matrix forest theorem and the graph bottleneck inequality.
1 Introduction

The classical distance for graph vertices is the shortest path distance [1]. Another distance, which is almost classical, is the resistance distance [12], also called the commute-time distance.

The forest distances $d_\alpha(i,j)$ [10] form a one-parametric family converging to the discrete distance ($d(i,j) = 1$ whenever vertices $i$ and $j$ are distinct) as $\alpha \to 0$ and becoming proportional to the resistance distance as $\alpha \to \infty$. The parameter $\alpha$ controls the relative influence of short and long paths between vertices on the distance between them.

In a recent paper [16] (see also [14]), the authors construct a parametric class of graph dissimilarity measures whose extrema are the shortest path distance and the resistance distance. Notably, in clustering tasks, the best performance is obtained with intermediate values of the family parameter. At the same time, the corresponding intermediate measures need not be distances, as they may violate the triangle inequality.
Thus, there is a demand in certain applications (these include data analysis, computer science, mathematical chemistry, and some others) for a family of graph distances whose extreme properties are similar to those of the dissimilarity measures in [16]. Such a family is introduced in this paper. It is the family of logarithmically transformed forest distances, and its construction is based on the matrix forest theorem [8] and the graph bottleneck inequality [5]. The logarithmic transformation not only leads to the shortest path distance as $\alpha \to 0$, but also, for every $\alpha > 0$, ensures the remarkable graph-geodetic property: $d(i,j) + d(j,k) = d(i,k)$ if and only if every path from $i$ to $k$ passes through $j$.

We now introduce the necessary notation. Let $G$ be a weighted multigraph (a weighted graph in which multiple edges are allowed) with vertex set $V(G) = \{1,\dots,n\}$, $n > 1$, and edge set $E(G)$. We assume that $G$ has no loops. For $i,j \in V(G)$, let $n_{ij} \in \{0,1,\dots\}$ be the number of edges incident to both $i$ and $j$ in $G$; for every $p \in \{1,\dots,n_{ij}\}$, $w_{ij}^{p} > 0$ is the weight of the $p$th edge of this type; let $w_{ij} = \sum_{p=1}^{n_{ij}} w_{ij}^{p}$ (if $n_{ij} = 0$, we set $w_{ij} = 0$) and $W = (w_{ij})_{n\times n}$. $W$ is the symmetric matrix of total edge weights of $G$.

A rooted tree is a connected and acyclic weighted graph in which one vertex, called the 
root, is marked. A rooted forest is a graph all of whose connected components are rooted 
trees. The roots of those trees are, by definition, the roots of the rooted forest. 

By the weight of a weighted graph $H$, $w(H)$, we mean the product of the weights of all its edges. If $H$ has no edges, then $w(H) = 1$. The weight of a set $S$ of graphs, $w(S)$, is the total weight of the graphs belonging to $S$; the weight of the empty set is zero. If the weights of all edges are unity, i.e., the graphs in $S$ are actually unweighted, then $w(S)$ reduces to the cardinality of $S$.

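For a given weighted multigraph $G$, by $\mathcal{F} = \mathcal{F}(G)$, $\mathcal{F}_{ij} = \mathcal{F}_{ij}(G)$, and $\mathcal{F}_{ij}^{(p)} = \mathcal{F}_{ij}^{(p)}(G)$ we denote the set of all spanning rooted forests of $G$, the set of all forests in $\mathcal{F}$ that have vertex $i$ belonging to a tree rooted at $j$, and the set of all forests in $\mathcal{F}_{ij}$ that have exactly $p$ edges. Let

$$f = w(\mathcal{F}),\quad f_{ij} = w(\mathcal{F}_{ij}),\quad\text{and}\quad f_{ij}^{(p)} = w(\mathcal{F}_{ij}^{(p)}),\qquad i,j \in V(G),\ 0 \le p < n; \qquad (1)$$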

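by $F$ we denote the matrix $(f_{ij})_{n\times n}$; $F$ is called the matrix of forests of $G$.
Let $L = (\ell_{ij})$ be the Laplacian matrix of $G$, i.e.,

$$\ell_{ij} = \begin{cases} -w_{ij}, & j \ne i,\\ \sum_{k \ne i} w_{ik}, & j = i. \end{cases}$$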

Consider the matrix

$$Q = (q_{ij}) = (I + L)^{-1}.$$

By the matrix forest theorem [8], for any weighted multidigraph $G$, $Q$ does exist and

$$q_{ij} = \frac{f_{ij}}{f},\qquad i,j = 1,\dots,n. \qquad (2)$$

Consequently, $F = fQ = f\,(I + L)^{-1}$ holds. $Q$ can be considered as a matrix providing a proximity (similarity) measure for the vertices of $G$ [8].
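The matrix forest theorem (2) can be checked by brute force on a small example. The following Python sketch is only an illustration, not part of the original construction: it takes a hypothetical 3-vertex multigraph with a pair of parallel edges (so that $w_{ij}$ sums two weights), enumerates every acyclic edge subset, counts all rootings of each forest to obtain $f$ and $f_{ij}$ as in (1), and compares $f_{ij}/f$ with the entries of $(I+L)^{-1}$ computed in exact rational arithmetic. The helper names (`laplacian`, `inverse`, `forest_weights`) are ours, and vertices are numbered from 0.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical 3-vertex multigraph (0-based vertices): (u, v, weight).
# The two parallel 0-1 edges stay separate edges, so w_01 = 1 + 2 = 3.
EDGES = [(0, 1, Fraction(1)), (0, 1, Fraction(2)), (1, 2, Fraction(1)), (0, 2, Fraction(3))]
N = 3


def laplacian(n, edges):
    """L = diag(W 1) - W, with parallel edge weights summed into W."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Exact Gauss-Jordan inversion over Fractions (M assumed nonsingular)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def find(parent, x):
    """Union-find root lookup."""
    while parent[x] != x:
        x = parent[x]
    return x


def forest_weights(n, edges):
    """Brute force: f = w(F) and f_ij = w(F_ij) over all spanning rooted forests."""
    f = Fraction(0)
    F = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):  # a spanning forest on n vertices has at most n - 1 edges
        for subset in combinations(edges, k):
            parent = list(range(n))
            weight, acyclic = Fraction(1), True
            for u, v, w in subset:
                ru, rv = find(parent, u), find(parent, v)
                if ru == rv:  # this edge would close a cycle
                    acyclic = False
                    break
                parent[ru] = rv
                weight *= w
            if not acyclic:
                continue
            comp = [find(parent, x) for x in range(n)]
            size = {c: comp.count(c) for c in set(comp)}
            rootings = 1  # every tree may be rooted at any of its vertices
            for s in size.values():
                rootings *= s
            f += weight * rootings
            for i in range(n):
                # i lies in a tree rooted at j: fix that root, root the other trees freely
                rest = rootings // size[comp[i]]
                for j in range(n):
                    if comp[j] == comp[i]:
                        F[i][j] += weight * rest
    return f, F


L = laplacian(N, EDGES)
Q = inverse([[Fraction(int(i == j)) + L[i][j] for j in range(N)] for i in range(N)])
f, F = forest_weights(N, EDGES)
print(all(Q[i][j] == F[i][j] / f for i in range(N) for j in range(N)))  # True
```

Enumeration is exponential in the number of edges, so it only serves as a sanity check on tiny graphs; in practice $Q$ is obtained by inverting $I + L$.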
By $d^s(i,j)$ we denote the shortest path distance, i.e., the number of edges in a shortest path between $i$ and $j$ in $G$; by $d^r(i,j)$ we denote the resistance distance between $i$ and $j$, defined as follows:

$$d^r(i,j) = \ell_{ii}^{+} + \ell_{jj}^{+} - 2\ell_{ij}^{+}, \qquad (3)$$

where $(\ell_{ij}^{+})_{n\times n} = L^{+}$ is the Moore-Penrose generalized inverse of the Laplacian matrix $L$ of $G$. If $G$ is connected, then, due to [9, Proposition 8],

$$L^{+} = (L + J)^{-1} - J, \qquad (4)$$
where $J$ is the $n\times n$ matrix with all entries $\frac{1}{n}$, and by [9, Theorem 3]

$$\ell_{ij}^{+} = \frac{f_{ij}^{(n-2)} - \frac{1}{n}f^{(n-2)}}{nt},\qquad i,j \in V(G),$$

holds, where $f^{(n-2)}$ is the total weight of spanning rooted forests with $n-2$ edges and $t$ is the total weight of spanning trees in $G$. By virtue of (3), (4), and this representation of $L^{+}$, we obtain

Corollary 1 (to Proposition 8 and Theorem 3 of [9]). If $G$ is connected, then

$$d^r(i,j) = x_{ii} + x_{jj} - 2x_{ij} = \frac{f_{ii}^{(n-2)} + f_{jj}^{(n-2)} - 2f_{ij}^{(n-2)}}{nt},\qquad i,j \in V(G), \qquad (5)$$

where $(x_{ij}) = (L + J)^{-1}$.
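As a small numerical illustration of (4) and (5), the sketch below (ours, in exact rational arithmetic; the function name `resistance_distances` is hypothetical) computes $d^r$ from $(L+J)^{-1}$. For the triangle with unit weights it should give $2/3$ for adjacent vertices (a unit resistor in parallel with two in series), and for the path on four vertices it reproduces the hop count.

```python
from fractions import Fraction


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        w = Fraction(w)
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Exact Gauss-Jordan inversion over Fractions (M assumed nonsingular)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def resistance_distances(n, edges):
    """Eq. (5): d_r(i, j) = x_ii + x_jj - 2 x_ij with (x_ij) = (L + J)^{-1}; G connected."""
    L = laplacian(n, edges)
    X = inverse([[L[i][j] + Fraction(1, n) for j in range(n)] for i in range(n)])
    return [[X[i][i] + X[j][j] - 2 * X[i][j] for j in range(n)] for i in range(n)]


triangle = resistance_distances(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)])
path = resistance_distances(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
print(triangle[0][1], path[0][3])  # 2/3 3
```

The $J$ terms in (4) cancel in $x_{ii} + x_{jj} - 2x_{ij}$, which is why $(L+J)^{-1}$ can be used directly instead of $L^{+}$.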

In Section 2 we introduce a new class of intrinsic graph distances, and in Sections 3 and 4 we study its properties.

2 Logarithmic forest distances 

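Consider the matrix

$$Q_\alpha = (I + \alpha L)^{-1}, \qquad (6)$$

where $L$ is the Laplacian matrix of $G$, $I$ the identity matrix, and $\alpha > 0$ a real parameter. Define the matrix $H_\alpha$ as follows:

$$H_\alpha = \gamma\,(\alpha - 1)\,\overrightarrow{\log_\alpha} Q_\alpha, \qquad (7)$$

where $\alpha \ne 1$, $\gamma$ is a positive factor, and $\overrightarrow{\varphi(Q_\alpha)}$ stands for componentwise operations (here, $\log_\alpha$ is applied to each entry of $Q_\alpha$). Finally, consider

$$D_\alpha = \tfrac{1}{2}\left(h_\alpha \mathbf{1}' + \mathbf{1} h_\alpha'\right) - H_\alpha, \qquad (8)$$

where $h_\alpha$ is the column vector containing the diagonal entries of $H_\alpha$, $h_\alpha'$ is the transpose of $h_\alpha$, and $\mathbf{1}$ and $\mathbf{1}'$ are the column of $n$ ones and its transpose. In Theorem 1 below we show that $D_\alpha$ is a matrix of distances between the vertices of $G$.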
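Equations (6) to (8), together with extension (9), translate into a few lines of code. The following is a minimal floating-point sketch, assuming a connected graph given as a hypothetical list of weighted edges `(u, v, w)` with 0-based vertices; the function name `log_forest_distances` is ours. It uses the identity $\gamma(\alpha-1)\log_\alpha x = \gamma\,\frac{\alpha-1}{\ln\alpha}\ln x$, so the case $\alpha = 1$ only changes the constant factor.

```python
import math


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (floating point)."""
    n = len(M)
    A = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def log_forest_distances(n, edges, alpha, gamma=1.0):
    """D_alpha of Eqs. (6)-(9); G must be connected, alpha > 0, gamma > 0."""
    L = laplacian(n, edges)
    Q = inverse([[float(i == j) + alpha * L[i][j] for j in range(n)] for i in range(n)])  # (6)
    # gamma (alpha - 1) log_alpha x = c ln x; at alpha = 1 the factor is gamma, Eq. (9)
    c = gamma if alpha == 1 else gamma * (alpha - 1) / math.log(alpha)
    H = [[c * math.log(q) for q in row] for row in Q]  # Eq. (7), entrywise
    h = [H[i][i] for i in range(n)]
    return [[0.5 * (h[i] + h[j]) - H[i][j] for j in range(n)] for i in range(n)]  # Eq. (8)


P4 = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]  # path 1-2-3-4, relabeled 0..3
D = log_forest_distances(4, P4, alpha=1.0)
print(round(D[0][1], 4), round(D[1][2], 4))  # 0.8243 0.9163
```

For the path on four vertices with $\alpha = \gamma = 1$ this gives $d(1,2) = \ln(\sqrt{130}/5) \approx 0.8243$ and $d(2,3) = \ln 2.5 \approx 0.9163$ in the 1-based numbering. Plain Gauss-Jordan inversion is adequate for small examples; very small or very large $\alpha$ would call for more careful arithmetic.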

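Since $\lim_{\alpha\to 1}\bigl((\alpha - 1)/\ln\alpha\bigr) = 1$ and $(\alpha - 1)\log_\alpha x = \frac{\alpha - 1}{\ln\alpha}\ln x$, we extend Eq. (7) to $\alpha = 1$ as follows:

$$H_1 = \gamma\,\overrightarrow{\ln}\,Q_1, \qquad (9)$$

which preserves continuity. This extension is assumed throughout the paper.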
Theorem 1 For any connected multigraph $G$ and any $\alpha, \gamma > 0$, $D_\alpha = (d_{ij}(\alpha))_{n\times n}$ defined by (8) with extension (9) is a matrix of distances on $V(G)$.

Before proving Theorem 1 we represent the entries of the distance matrix $D_\alpha$ in terms of the weights of spanning forests.

For $\alpha > 0$, let $G_\alpha$ be the weighted multigraph resulting from $G$ by multiplying the weights of all edges by $\alpha$. Let

$$f_{ij}(\alpha) = w(\mathcal{F}_{ij}(G_\alpha)),\qquad i,j = 1,\dots,n. \qquad (10)$$

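Proposition 1 For any connected multigraph $G$ and any $\alpha, \gamma > 0$, the matrix $D_\alpha = (d_{ij}(\alpha))$ defined by (8) with extension (9) exists and

$$d_{ij}(\alpha) = \begin{cases} \gamma\,(\alpha - 1)\log_\alpha \dfrac{\sqrt{f_{ii}(\alpha)\,f_{jj}(\alpha)}}{f_{ij}(\alpha)}, & \alpha \ne 1,\\[2mm] \gamma \ln \dfrac{\sqrt{f_{ii}(1)\,f_{jj}(1)}}{f_{ij}(1)}, & \alpha = 1. \end{cases} \qquad (11)$$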



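Proof. Observe that $\alpha L$ is the Laplacian matrix of the weighted multigraph $G_\alpha$. Then applying the matrix forest theorem (2) to $G_\alpha$, one obtains that the matrix $Q_\alpha$ exists and, provided that $G$ is connected, its entries are strictly positive (for all $i,j$, every spanning tree rooted at $j$ belongs to $\mathcal{F}_{ij}(G_\alpha)$, so $f_{ij}(\alpha) > 0$). Therefore $H_\alpha$ and $D_\alpha$ also exist.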

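Let $H_\alpha = (h_{ij}(\alpha))$ and $Q_\alpha = (q_{ij}(\alpha))$. For any positive $\alpha \ne 1$ and $\gamma$, equations (6) to (8) and the matrix forest theorem (2) imply

$$\begin{aligned} d_{ij}(\alpha) &= \tfrac{1}{2}\bigl(h_{ii}(\alpha) + h_{jj}(\alpha)\bigr) - h_{ij}(\alpha)\\ &= \gamma\,(\alpha - 1)\Bigl[\tfrac{1}{2}\bigl(\log_\alpha q_{ii}(\alpha) + \log_\alpha q_{jj}(\alpha)\bigr) - \log_\alpha q_{ij}(\alpha)\Bigr]\\ &= \gamma\,(\alpha - 1)\log_\alpha \frac{\sqrt{q_{ii}(\alpha)\,q_{jj}(\alpha)}}{q_{ij}(\alpha)} = \gamma\,(\alpha - 1)\log_\alpha \frac{\sqrt{f_{ii}(\alpha)\,f_{jj}(\alpha)}}{f_{ij}(\alpha)} \end{aligned}$$

for every $i,j = 1,\dots,n$; the last equality holds because, by (2) applied to $G_\alpha$, $q_{ij}(\alpha) = f_{ij}(\alpha)/w(\mathcal{F}(G_\alpha))$, and the common denominator cancels. If $\alpha = 1$, the required expression follows similarly using (9). □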

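Proof of Theorem 1. Since $Q_\alpha$ is symmetric, so is $D_\alpha$. Proving the theorem therefore amounts to showing that for every $i,j,k \in V(G)$:

(i) $d_{ij}(\alpha) = 0$ if and only if $i = j$;

(ii) $d_{ij}(\alpha) + d_{jk}(\alpha) - d_{ki}(\alpha) \ge 0$ (triangle inequality).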

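If $i = j$, then $\log_\alpha \frac{\sqrt{f_{ii}(\alpha)\,f_{ii}(\alpha)}}{f_{ii}(\alpha)} = \log_\alpha 1 = 0$; hence, by Proposition 1, $d_{ii}(\alpha) = 0$.

Conversely, if $d_{ij}(\alpha) = 0$, then by Proposition 1, $f_{ii}(\alpha)\,f_{jj}(\alpha) = (f_{ij}(\alpha))^2$ holds. If $i \ne j$, then $f_{ij}(\alpha) < f_{jj}(\alpha)$, since, by definition, $\mathcal{F}_{ij}(G_\alpha) \subset \mathcal{F}_{jj}(G_\alpha)$ and $\mathcal{F}_{jj}(G_\alpha) \setminus \mathcal{F}_{ij}(G_\alpha)$ contains the trivial spanning rooted forest having no edges and weight unity. Similarly, $f_{ji}(\alpha) < f_{ii}(\alpha)$, and since $F(\alpha) = (f_{ij}(\alpha))_{n\times n}$ is symmetric, $f_{ij}(\alpha) < f_{ii}(\alpha)$. Consequently, $(f_{ij}(\alpha))^2 < f_{ii}(\alpha)\,f_{jj}(\alpha)$, so $i \ne j$ contradicts the assumption $d_{ij}(\alpha) = 0$; hence $i = j$.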

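To prove (ii), observe that by (2), (7), and (8), for any positive $\alpha \ne 1$ we have

$$\begin{aligned} d_{ij}(\alpha) + d_{jk}(\alpha) - d_{ki}(\alpha) &= \tfrac{1}{2}\bigl(h_{ii}(\alpha) + h_{jj}(\alpha) + h_{jj}(\alpha) + h_{kk}(\alpha) - h_{kk}(\alpha) - h_{ii}(\alpha)\bigr)\\ &\quad - h_{ij}(\alpha) - h_{jk}(\alpha) + h_{ki}(\alpha)\\ &= h_{jj}(\alpha) + h_{ki}(\alpha) - h_{ij}(\alpha) - h_{jk}(\alpha)\\ &= \gamma\,(\alpha - 1)\log_\alpha \frac{f_{jj}(\alpha)\,f_{ki}(\alpha)}{f_{ij}(\alpha)\,f_{jk}(\alpha)}. \qquad (12) \end{aligned}$$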




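From the symmetry of $Q_\alpha$ and the graph bottleneck inequality [5, Corollary 1], $f_{jj}(\alpha)\,f_{ki}(\alpha) \ge f_{ij}(\alpha)\,f_{jk}(\alpha)$, so the fraction in (12) is at least 1. Since $\gamma(\alpha - 1)\log_\alpha x = \gamma\,\frac{\alpha - 1}{\ln\alpha}\ln x$ and $\frac{\alpha - 1}{\ln\alpha} > 0$ for every positive $\alpha \ne 1$, (12) implies that $d_{ij}(\alpha) + d_{jk}(\alpha) - d_{ki}(\alpha) \ge 0$. For $\alpha = 1$, (i) and (ii) are proved similarly. □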
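Theorem 1 can also be spot-checked numerically. The sketch below is an illustration on a hypothetical 5-vertex weighted graph, with helper names of our choosing: for several values of $\alpha$ it verifies the graph bottleneck inequality $q_{jj}q_{ki} \ge q_{ij}q_{jk}$ for $Q_\alpha$ (equivalent to the inequality for $f_{ij}(\alpha)$, since both sides are scaled by the same positive factor) and the triangle inequality for $D_\alpha$ over all triples, up to floating-point tolerance. Such a check proves nothing; it only guards against implementation mistakes.

```python
import math
from itertools import product


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (floating point)."""
    n = len(M)
    A = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def forest_matrix(n, edges, alpha):
    """Q_alpha = (I + alpha L)^{-1}, Eq. (6)."""
    L = laplacian(n, edges)
    return inverse([[float(i == j) + alpha * L[i][j] for j in range(n)] for i in range(n)])


def log_forest_distances(Q, alpha, gamma=1.0):
    """D_alpha from a precomputed Q_alpha via Eqs. (7)-(9)."""
    n = len(Q)
    c = gamma if alpha == 1 else gamma * (alpha - 1) / math.log(alpha)
    H = [[c * math.log(q) for q in row] for row in Q]
    h = [H[i][i] for i in range(n)]
    return [[0.5 * (h[i] + h[j]) - H[i][j] for j in range(n)] for i in range(n)]


# Hypothetical connected test graph (0-based vertices, unequal weights).
G = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 0.5), (3, 4, 1.5)]
n = 5
worst_bottleneck = worst_triangle = math.inf
for alpha in (0.1, 0.5, 1.0, 3.0, 20.0):
    Q = forest_matrix(n, G, alpha)
    D = log_forest_distances(Q, alpha)
    for i, j, k in product(range(n), repeat=3):
        # graph bottleneck inequality for Q_alpha: q_jj q_ki >= q_ij q_jk
        worst_bottleneck = min(worst_bottleneck, Q[j][j] * Q[k][i] - Q[i][j] * Q[j][k])
        # triangle inequality (ii), cf. Eq. (12)
        worst_triangle = min(worst_triangle, D[i][j] + D[j][k] - D[k][i])
print(worst_bottleneck >= -1e-12, worst_triangle >= -1e-9)  # True True
```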

Theorem 1 enables us to give the following definition.

Definition 1 Suppose that $G$ is a connected multigraph and $\alpha > 0$. The logarithmic forest distance with parameter $\alpha$ on $G$ is the function $d_\alpha\colon V(G)\times V(G) \to \mathbb{R}$ such that $d_\alpha(i,j) = d_{ij}(\alpha)$, where $D_\alpha = (d_{ij}(\alpha))$ is defined by (8) with extension (9).

In Definition 1, the scaling factor $\gamma$ is regarded as an implicit, i.e., internal, parameter of logarithmic forest distances. In Section 3 we show that all of them are graph-geodetic. In Section 4, distances with a specific value of $\gamma$ will be considered.

3 The logarithmic forest distances are graph-geodetic 

The key property of the logarithmic forest distances is that they are graph-geodetic. 

Definition 2 For a multigraph $G$, a function $d\colon V(G)\times V(G) \to \mathbb{R}$ is graph-geodetic whenever for all $i,j,k \in V(G)$, $d(i,j) + d(j,k) = d(i,k)$ holds if and only if every path from $i$ to $k$ passes through $j$.

If $d(i,j)$ is a distance for graph vertices, then the graph-geodetic property is a natural condition specifying when the triangle inequality becomes an equality. The shortest path distance clearly possesses the "if" part of the graph-geodetic property, but not the "only if" part: in the 4-cycle 1-2-3-4-1, $d^s(1,2) + d^s(2,3) = d^s(1,3)$ although the path $(1,4,3)$ avoids vertex 2. The full property of the resistance distance was proved in [12]. The ordinary distance in a Euclidean space satisfies a similar condition obtained by substituting "line segment" for "path."

Theorem 2 For every connected multigraph $G$ and every $\alpha > 0$, the logarithmic forest distance $d_\alpha(i,j)$ is graph-geodetic.

Proof. By the graph bottleneck equality [5, Corollary 1] and the symmetry of $F(\alpha) = (f_{ij}(\alpha))_{n\times n}$, $f_{jj}(\alpha)\,f_{ki}(\alpha) = f_{ij}(\alpha)\,f_{jk}(\alpha)$ is true if and only if every path in $G_\alpha$ from $i$ to $k$ passes through $j$ (the paths in $G_\alpha$ are exactly those in $G$). Owing to (12) and the analogous expression for $\alpha = 1$, this equality is equivalent to $d_\alpha(i,j) + d_\alpha(j,k) - d_\alpha(k,i) = 0$. The desired statement follows. □
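To see the graph-geodetic property at work, consider a hypothetical "bowtie" graph: two triangles $\{0,1,2\}$ and $\{2,3,4\}$ sharing the cut vertex 2 (0-based labels). Every path from 0 to 3 passes through 2, so Theorem 2 predicts $d_\alpha(0,2) + d_\alpha(2,3) = d_\alpha(0,3)$, whereas the edge $(0,2)$ bypasses vertex 1, so $d_\alpha(0,1) + d_\alpha(1,2) > d_\alpha(0,2)$. The sketch below checks both for $\alpha = 0.7$ and $\gamma = 1$, which are arbitrary choices; the helper names are ours.

```python
import math


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (floating point)."""
    n = len(M)
    A = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def log_forest_distances(n, edges, alpha, gamma=1.0):
    """D_alpha of Eqs. (6)-(9) for a connected graph."""
    L = laplacian(n, edges)
    Q = inverse([[float(i == j) + alpha * L[i][j] for j in range(n)] for i in range(n)])
    c = gamma if alpha == 1 else gamma * (alpha - 1) / math.log(alpha)
    H = [[c * math.log(q) for q in row] for row in Q]
    h = [H[i][i] for i in range(n)]
    return [[0.5 * (h[i] + h[j]) - H[i][j] for j in range(n)] for i in range(n)]


# Hypothetical "bowtie": triangles {0, 1, 2} and {2, 3, 4} share the cut vertex 2.
BOWTIE = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (2, 4, 1.0)]
D = log_forest_distances(5, BOWTIE, alpha=0.7)
through_cut = D[0][2] + D[2][3] - D[0][3]  # every 0-3 path passes through 2
bypassable = D[0][1] + D[1][2] - D[0][2]   # the edge (0, 2) avoids vertex 1
print(abs(through_cut) < 1e-9, bypassable > 1e-6)  # True True
```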

Graph-geodetic functions have many interesting properties. One of them, as mentioned in [12], is a simple connection (such as that obtained in [11]) between the cofactors and determinant of $G$'s distance matrix and those of the strong blocks of $G$. Another example is the recursive Theorem 8 in [13]. Clearly, for a tree, all the $n(n-1)/2$ values of a graph-geodetic distance are determined by the $n-1$ values corresponding to the pairs of adjacent vertices, since the unique path between two vertices passes through every intermediate vertex on it. The logarithmic forest distances, as well as the limiting shortest path and resistance distances, need not be Euclidean; however, by Blumenthal's "Square-Root" theorem, the corresponding "square-rooted" distances satisfy the 3-Euclidean condition (cf. [13]).
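The remark about trees can be illustrated as follows. On a hypothetical weighted tree, the sketch computes the full matrix $D_\alpha$, then rebuilds every entry by summing only the $n-1$ adjacent-pair values along the unique tree path, and compares the two. The tree and the values $\alpha = 2$, $\gamma = 1$ are arbitrary choices, and the helper names are ours.

```python
import math


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (floating point)."""
    n = len(M)
    A = [[float(x) for x in row] + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def log_forest_distances(n, edges, alpha, gamma=1.0):
    """D_alpha of Eqs. (6)-(9) for a connected graph."""
    L = laplacian(n, edges)
    Q = inverse([[float(i == j) + alpha * L[i][j] for j in range(n)] for i in range(n)])
    c = gamma if alpha == 1 else gamma * (alpha - 1) / math.log(alpha)
    H = [[c * math.log(q) for q in row] for row in Q]
    h = [H[i][i] for i in range(n)]
    return [[0.5 * (h[i] + h[j]) - H[i][j] for j in range(n)] for i in range(n)]


# Hypothetical weighted tree with edges 0-1, 1-2, 1-3, 3-4.
TREE = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5), (3, 4, 3.0)]
n = 5
D = log_forest_distances(n, TREE, alpha=2.0)

adjacent = {v: [] for v in range(n)}
for u, v, _ in TREE:
    adjacent[u].append(v)
    adjacent[v].append(u)


def distances_from_edges(source):
    """Sum the n - 1 adjacent-pair values D[u][v] along the unique tree paths."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v in adjacent[u]:
            if v not in dist:
                dist[v] = dist[u] + D[u][v]
                stack.append(v)
    return dist


max_err = max(abs(distances_from_edges(i)[k] - D[i][k]) for i in range(n) for k in range(n))
print(max_err < 1e-9)  # True
```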

It can be observed that the "ordinary" forest distances [10] are generally not graph-geodetic. The graph bottleneck inequality [5] underlying the proofs of Theorems 1 and 2 is actually a multiplicative counterpart of the triangle inequality for proximities [8].
4 Asymptotic properties

Consider the subclass of logarithmic forest distances with the scaling factor

$$\gamma = \ln\!\left(e + \alpha^{2/n}\right). \qquad (13)$$

We will prove that these distances generalize both the shortest path and the resistance distances.

Proposition 2 For any connected multigraph $G$ and every $i,j \in V(G)$, $d_\alpha(i,j)$ with scaling factor (13) converges to the shortest path distance $d^s(i,j)$ as $\alpha \to 0^+$.

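Proof. For $i = j$ both distances are zero, so let $j \ne i$ and denote by $m$ the shortest path distance $d^s(i,j)$. Observe that the weight of every forest that belongs to $\mathcal{F}_{ii}(G_\alpha)$ and has at least one edge vanishes as $\alpha \to 0^+$, whereas $\mathcal{F}_{ii}(G_\alpha)$ contains one trivial forest without edges, whose weight is unity; hence $f_{ii}(\alpha) = 1 + o(1)$, and likewise $f_{jj}(\alpha) = 1 + o(1)$. Moreover, every forest in $\mathcal{F}_{ij}$ contains a path from $i$ to $j$ and thus has at least $m$ edges, while a shortest path from $i$ to $j$ together with the remaining isolated vertices is such a forest with exactly $m$ edges; therefore $f_{ij}(\alpha) = \alpha^m\bigl(f_{ij}^{(m)} + o(1)\bigr)$ with $f_{ij}^{(m)} > 0$. Taking this into account, using Proposition 1 and (1), and noting that $\gamma(\alpha - 1) \to -1$ as $\alpha \to 0^+$, one obtains

$$\lim_{\alpha\to 0^+} d_\alpha(i,j) = \lim_{\alpha\to 0^+}\left(-\log_\alpha \frac{1 + o(1)}{\alpha^m\bigl(f_{ij}^{(m)} + o(1)\bigr)}\right),$$

where $o(1) \to 0$ as $\alpha \to 0^+$. Consequently,

$$\lim_{\alpha\to 0^+} d_\alpha(i,j) = \lim_{\alpha\to 0^+}\left(m + \log_\alpha f_{ij}^{(m)}\right) = m = d^s(i,j). \qquad \square$$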
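Proposition 2 can be observed numerically. The following sketch is ours: it evaluates (11) with the scaling factor (13) in exact rational arithmetic (so that the tiny entries of $Q_\alpha$ are not lost to rounding) on the path with four vertices at $\alpha = 10^{-6}$; the values should be close to the shortest path distances 1, 2, 3 from an end vertex. The function name `d_alpha` is hypothetical.

```python
import math
from fractions import Fraction


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        w = Fraction(w)
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Exact Gauss-Jordan inversion over Fractions (M assumed nonsingular)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def d_alpha(n, edges, alpha, i, j):
    """Eq. (11) with scaling factor (13); alpha is a positive Fraction, alpha != 1."""
    L = laplacian(n, edges)
    Q = inverse([[Fraction(int(r == c)) + alpha * L[r][c] for c in range(n)] for r in range(n)])
    a = float(alpha)
    gamma = math.log(math.e + a ** (2 / n))  # Eq. (13)
    # ln(sqrt(q_ii q_jj) / q_ij) = 0.5 ln(ratio); ratio - 1 is formed exactly
    ratio = Q[i][i] * Q[j][j] / Q[i][j] ** 2
    return gamma * (a - 1) / math.log(a) * 0.5 * math.log1p(float(ratio - 1))


P4 = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]  # path 1-2-3-4, relabeled 0..3
alpha = Fraction(1, 10**6)
print([d_alpha(4, P4, alpha, 0, k) for k in (1, 2, 3)])  # close to [1, 2, 3]
```

For this path the deviation comes mainly from $\gamma - 1 \approx \alpha^{2/n}/e$; when $i$ and $j$ are joined by several shortest paths, the term $\log_\alpha f_{ij}^{(m)}$ from the proof decays only like $1/|\ln\alpha|$.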

Proposition 3 For any connected multigraph $G$ and every $i,j \in V(G)$, $d_\alpha(i,j)$ with scaling factor (13) converges to the resistance distance $d^r(i,j)$ as $\alpha \to \infty$.

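Proof. Observe that for every $i,j \in V(G)$, $f_{ij}^{(n-1)}$ is the total weight of all spanning trees in $G$. Denote it by $t$; since $G$ is connected, $t > 0$. As no spanning forest has more than $n-1$ edges, $f_{ij}(\alpha) = \alpha^{n-1}\bigl(t + \alpha^{-1}f_{ij}^{(n-2)} + o(\alpha^{-1})\bigr)$. By Proposition 1 one has

$$\begin{aligned} \lim_{\alpha\to\infty} d_\alpha(i,j) &= \lim_{\alpha\to\infty} \ln\!\left(e + \alpha^{2/n}\right)\frac{\alpha - 1}{\ln\alpha}\,\ln\frac{\sqrt{\alpha^{n-1}\bigl(t + \alpha^{-1}f_{ii}^{(n-2)} + o(\alpha^{-1})\bigr)\,\alpha^{n-1}\bigl(t + \alpha^{-1}f_{jj}^{(n-2)} + o(\alpha^{-1})\bigr)}}{\alpha^{n-1}\bigl(t + \alpha^{-1}f_{ij}^{(n-2)} + o(\alpha^{-1})\bigr)}\\ &= \lim_{\alpha\to\infty} \frac{2}{n}\,\alpha\,\ln\frac{\sqrt{\Bigl(1 + \frac{f_{ii}^{(n-2)}}{\alpha t} + o(\alpha^{-1})\Bigr)\Bigl(1 + \frac{f_{jj}^{(n-2)}}{\alpha t} + o(\alpha^{-1})\Bigr)}}{1 + \frac{f_{ij}^{(n-2)}}{\alpha t} + o(\alpha^{-1})}, \end{aligned}$$

where $o(\alpha^{-1})$ denotes expressions such that $\alpha\cdot o(\alpha^{-1}) \to 0$ as $\alpha \to \infty$, and the second equality uses $\ln(e + \alpha^{2/n})\,\frac{\alpha - 1}{\ln\alpha} \sim \frac{2}{n}\,\alpha$. Hence

$$\lim_{\alpha\to\infty} d_\alpha(i,j) = \frac{2}{n}\lim_{\alpha\to\infty}\ln\frac{\Bigl(1 + \frac{f_{ii}^{(n-2)}}{\alpha t}\Bigr)^{\alpha/2}\Bigl(1 + \frac{f_{jj}^{(n-2)}}{\alpha t}\Bigr)^{\alpha/2}}{\Bigl(1 + \frac{f_{ij}^{(n-2)}}{\alpha t}\Bigr)^{\alpha}} = \frac{2}{n}\ln\frac{\exp\bigl(f_{ii}^{(n-2)}/(2t)\bigr)\exp\bigl(f_{jj}^{(n-2)}/(2t)\bigr)}{\exp\bigl(f_{ij}^{(n-2)}/t\bigr)} = \frac{f_{ii}^{(n-2)} + f_{jj}^{(n-2)} - 2f_{ij}^{(n-2)}}{nt}.$$

Consequently, by Corollary 1 presented in Section 1, $\lim_{\alpha\to\infty} d_\alpha(i,j) = d^r(i,j)$. □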
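Proposition 3 can be observed in the same way on the 4-cycle, where the resistance distance differs from the shortest path distance: $d^r = 3/4$ for adjacent vertices and $d^r = 1$ for opposite ones. The sketch (ours, with a hypothetical `d_alpha` helper and exact arithmetic) evaluates (11) with (13) at $\alpha = 10^6$. Computing the logarithm via `math.log1p` of an exactly formed difference keeps the small quantity $\ln(\sqrt{q_{ii}q_{jj}}/q_{ij}) = O(1/\alpha)$ accurate.

```python
import math
from fractions import Fraction


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        w = Fraction(w)
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Exact Gauss-Jordan inversion over Fractions (M assumed nonsingular)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def d_alpha(n, edges, alpha, i, j):
    """Eq. (11) with scaling factor (13); alpha is a positive Fraction, alpha != 1."""
    L = laplacian(n, edges)
    Q = inverse([[Fraction(int(r == c)) + alpha * L[r][c] for c in range(n)] for r in range(n)])
    a = float(alpha)
    gamma = math.log(math.e + a ** (2 / n))  # Eq. (13)
    # ln(sqrt(q_ii q_jj) / q_ij) = 0.5 ln(ratio); ratio - 1 is formed exactly
    ratio = Q[i][i] * Q[j][j] / Q[i][j] ** 2
    return gamma * (a - 1) / math.log(a) * 0.5 * math.log1p(float(ratio - 1))


C4 = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]  # hypothetical 4-cycle
alpha = Fraction(10**6)
near = d_alpha(4, C4, alpha, 0, 1)      # resistance distance 3/4
opposite = d_alpha(4, C4, alpha, 0, 2)  # resistance distance 1
print(near, opposite)
```

The approach is slow: the factor $\ln(e + \alpha^{2/n})$ differs from $\frac{2}{n}\ln\alpha$ by about $e\,\alpha^{-2/n}$, so at $\alpha = 10^6$ and $n = 4$ the values still exceed $d^r$ by a few hundredths of a percent.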

For logarithmic forest distances with arbitrary positive scaling factors $\gamma$, "converges" in Propositions 2 and 3 must be replaced by "becomes proportional."
5 Concluding remarks



On intercomponent distances 

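Throughout the paper, we assumed that $G$ is connected. Otherwise, if $G$ has more than one component and $i$ and $j$ belong to different components, then, by the matrix forest theorem (2), $q_{ij} = q_{ji} = 0$. Consequently, if $\log_\alpha(\cdot)$ is considered as a function mapping to the extended line $\mathbb{R} \cup \{-\infty, +\infty\}$, then $h_{ij}(\alpha) = -\infty$ (because $\gamma(\alpha - 1)/\ln\alpha > 0$) while the diagonal entries of $H_\alpha$ are finite, so (8) leads to the distance $+\infty$ between $i$ and $j$, which seems quite natural.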
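This convention for disconnected graphs can be mimicked in floating-point arithmetic, where $-\infty$ is available. In the sketch below, the graph made of two disjoint edges is hypothetical, and the helper `ext_log` is our assumption: it maps $0$ to $-\infty$, following the extended-line interpretation above. Exact fractions guarantee that the off-block entries of $Q_\alpha$ are exactly zero.

```python
import math
from fractions import Fraction


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        w = Fraction(w)
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Exact Gauss-Jordan inversion over Fractions (M assumed nonsingular)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def ext_log(x):
    """Natural log on [0, inf) with the extended-line convention ln 0 = -inf (assumed)."""
    return -math.inf if x == 0 else math.log(float(x))


# Hypothetical disconnected graph: components {0, 1} and {2, 3}.
TWO_EDGES = [(0, 1, 1), (2, 3, 1)]
n, alpha, gamma = 4, Fraction(2), 1.0
L = laplacian(n, TWO_EDGES)
Q = inverse([[Fraction(int(i == j)) + alpha * L[i][j] for j in range(n)] for i in range(n)])
c = gamma * (float(alpha) - 1) / math.log(float(alpha))  # > 0 for every alpha > 0, alpha != 1
H = [[c * ext_log(q) for q in row] for row in Q]  # Eq. (7), entrywise
h = [H[i][i] for i in range(n)]
D = [[0.5 * (h[i] + h[j]) - H[i][j] for j in range(n)] for i in range(n)]  # Eq. (8)
print(Q[0][2], D[0][2], D[0][1])  # Q[0][2] is 0, D[0][2] is inf, D[0][1] = log2(1.5)
```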

On the parameter a and the length of paths between vertices 

The parameter $\alpha$ of logarithmic forest distances controls the relative influence of short, medium, and long paths between vertices $i$ and $j$ on the distance $d_\alpha(i,j)$. As $\alpha \to 0$, only the shortest paths matter; the long paths have the maximum effect as $\alpha \to \infty$.

On the "mixture" of the shortest-path and resistance distances 

The simplest way of "generalizing" both the shortest-path and the resistance distances is to consider the convex combination of the form $d'_\alpha(i,j) = \alpha\, d^r(i,j) + (1 - \alpha)\, d^s(i,j)$, where $\alpha \in [0,1]$. However, this family is quite poor from both theoretical and practical points of view. Let, for example, $G$ be a path on four vertices: $V(G) = \{1,2,3,4\}$ and $E(G) = \{(1,2), (2,3), (3,4)\}$. Then $d^s(1,2) = d^s(2,3) = d^r(1,2) = d^r(2,3) = 1$, and therefore $d'_\alpha(1,2) = d'_\alpha(2,3)$ for all $\alpha \in [0,1]$. On the other hand, in applications, there are models and intuitive heuristics that result in either $d(1,2) > d(2,3)$ or $d(1,2) < d(2,3)$. Indeed, suppose that the distance $d(i,j)$ should depend on the whole set of routes between $i$ and $j$: the shorter and more numerous the routes, the smaller the distance. Then the inequality $d(1,2) > d(2,3)$ is suggested by the observation that there are three routes of length 3 between vertices 2 and 3 (namely, (2,3,2,3), (2,1,2,3), and (2,3,4,3)) and only two routes of length 3 between vertices 1 and 2 ((1,2,1,2) and (1,2,3,2)). However, if the relative numbers of routes are important, then the opposite inequality $d(1,2) < d(2,3)$ can be justified by the observation that (1,2) is the unique route of length 1 starting at vertex 1, whereas (2,3) and (3,2) are not unique routes starting at vertices 2 and 3, respectively. It should be noted that the inequality $d(1,2) < d(2,3)$ holds true for the quasi-Euclidean graph distance [13]. The above example demonstrates that distances providing $d(1,2) = d(2,3)$ are insufficient for the applications of graph theory. As regards the forest distances, the logarithmic forest distances provide $d_\alpha(1,2) < d_\alpha(2,3)$, whereas the "ordinary" forest distances [10] satisfy the opposite inequality $d(1,2) > d(2,3)$.
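The four-vertex path example can be reproduced directly. The sketch below is ours (0-based labels 0 to 3 stand for vertices 1 to 4): it counts routes of length 3 as entries of $A^3$, where $A$ is the adjacency matrix, and compares both kinds of forest distances at $\alpha = 1$. For the "ordinary" forest distance we assume the form $q_{ii} + q_{jj} - 2q_{ij}$ with $Q = (I+L)^{-1}$; a positive normalizing factor, if [10] uses one, would not affect the comparison.

```python
import math
from fractions import Fraction


def laplacian(n, edges):
    """L = diag(W 1) - W for a multigraph given as (u, v, weight) triples."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        w = Fraction(w)
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L


def inverse(M):
    """Exact Gauss-Jordan inversion over Fractions (M assumed nonsingular)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def matmul(X, Y):
    """Plain square matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


P4 = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]  # path 1-2-3-4, relabeled 0..3
n = 4

# Routes (walks) of length 3 are counted by the entries of A^3.
A = [[0] * n for _ in range(n)]
for u, v, _ in P4:
    A[u][v] = A[v][u] = 1
A3 = matmul(matmul(A, A), A)

# Exact Q = (I + L)^{-1}, i.e., alpha = 1.
L = laplacian(n, P4)
Q = inverse([[Fraction(int(i == j)) + L[i][j] for j in range(n)] for i in range(n)])


def log_forest(i, j):
    """Eq. (11) at alpha = 1 with gamma = 1."""
    return 0.5 * math.log(float(Q[i][i] * Q[j][j] / Q[i][j] ** 2))


def ordinary_forest(i, j):
    """Assumed form q_ii + q_jj - 2 q_ij of the ordinary forest distance."""
    return Q[i][i] + Q[j][j] - 2 * Q[i][j]


print(A3[0][1], A3[1][2])                            # 2 3
print(log_forest(0, 1) < log_forest(1, 2))           # True
print(ordinary_forest(0, 1), ordinary_forest(1, 2))  # 13/21 4/7
```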

The shortest-path and resistance distances in the framework of forest distances 

In the view of H. Chen and F. Zhang [3], "...the shortest-path [distance] might be imagined to be more relevant when there is corpuscular communication (along edges) between two vertices, whereas the resistance distance might be imagined to be more relevant when the communication is wave-like." However, they do not develop this idea in depth. As has been shown in this paper, the shortest-path and resistance distances are two extreme examples of the logarithmic forest distances. The forest distance between vertices $i$ and $j$ is interpreted as the probability of choosing a forest partition separating $i$ and $j$ in the model of random forest partitions of graph $G$ [10, Proposition 5]. As $\alpha \to 0$, transformation (7) preserves only those partitions that connect $i$ and $j$ by a shortest path and separate all other vertices; thereby the shortest path distance results, as we see in Proposition 2. As $\alpha \to \infty$, this transformation preserves only the partitions determined by two disjoint trees, which leads to the resistance distance, as Proposition 3 states.